Align target sequence to selected transcript on protein level#79
Align target sequence to selected transcript on protein level#79sallybg wants to merge 2 commits intomavedb-devfrom
Conversation
bencap
left a comment
There was a problem hiding this comment.
Thanks Sally, looks like a really nice improvement to the mapping logic and it's resolving the cases we identified soundly. There's just one potential issue we should resolve before merging.
| For protein annotations, these strings must be adjusted to match the offset defined by the start of the | ||
| transcript sequence. For genomic annotations, these strings must be adjusted to match the coordinates of |
There was a problem hiding this comment.
We should adjust this documentation to match the new logic of aligned offsets.
| protein_align_result = align_target_to_protein( | ||
| sequence, transcript.sequence, silent | ||
| ) |
There was a problem hiding this comment.
We should add a type guard here isinstance(transcript, TxSelectError) like we have before accessing any properties of transcript in the loop below. If we attempt to access transcript.sequence while the transcript is of type TxSelectError, we'll get an AttributeError.
I think it's enough to just guard it and then continue to the loop if it is of the wrong error type, so that we get per-row handling of error messages.
| protein_align_result = align_target_to_protein( | |
| sequence, transcript.sequence, silent | |
| ) | |
| protein_align_result = None | |
| if isinstance(transcript, TxSelectResult): | |
| protein_align_result = align_target_to_protein( | |
| sequence, transcript.sequence, silent | |
| ) |
| @@ -767,7 +814,7 @@ | |||
| else: | |||
| if _hgvs_pro_is_valid(row.hgvs_pro): | |||
There was a problem hiding this comment.
Given the comment preceding this one, we'd I think also want to guard against a NoneType protein_align_result here. I'm not convinced that can actually occur in practice (Often in this code base things are typed as Unions, but the types are heavily correlated such that if one object is type X, another must be type Y and not Z) , but it's a good guard to have.
I think transcript is fine typed as is given my notion on the correlated types and since it is pre-exists this change.
For protein-coding, non-reference variants, align protein-level target sequence to protein-level selected transcript sequence, and use alignment information to generate mappings. This handles within-sequence offsets between the target and selected transcript, which was not handled previously.